The goals of this exercise are to: - Introduce you to bayesian analysis using revbayes
This file will not knit until all the results from RevBayes are in a folder with it. You can click Run > Run All to run the code up to what you have completed as you step through the exercise.
RevBayes is a very flexible tool for phylogenetic analyses. You have extensive control over which features of your model are constant, deterministic, or stochastic. This allows RevBayes to do a standard Bayesian phylogenetic inference, but it can also do much more including time calibration, ancestral character state reconstruction, and phylogenetic comparative analyses.
This flexibility comes with greater complexity than many other tools.
For example, the models are explicitly specified rather than just called
by name (eg GTR). The configurations are placed in RevBayes script files
that typically end with the file extension .Rev. These
files contain a series of commands written in RevBayes’ own language,
which has a syntax
similar to R. For an introduction to how to specify models and implement
analyses see the documentation on Getting
Started with RevBayes tutorial. For an overview of all the commands
see the documentation.
A typical RevBayes script has commands that load the data, set up the model, define the analysis, run the analysis, and control what is in the output and where it goes. For routine analyses you can modify the basics of existing files, changing the input and output files for example. For more more specialized analyses you can extensively revise and expand these files, or write them from scrap.
There is a series of detailed tutorials that serve as a great starting point for using RevBayes. This exercise is based off of the Nucleotide substitution models tutorial.
RevBayes can be difficult to install because it requires an external library, boost. I suggest using it on a remote cluster.
We will also use two desktop tools to view the results. Both require java, which you will need to install if you don’t have it already.
Install these desktop tools on your computer:
This exercise folder contains:
data directory with the multiple sequence alignment
we will consider, in nexus format..Rev revbayes scripts. Each contains all
the RevBays commands for one analysis.rb, to the
.Rev script that contains the commands for your
analysis.This exercise is a bit different than the others in that you are fallowing another existing tutorial – Nucleotide substitution models. The problems listed here related to that tutorial and the steps you take there.
That tutorial is written as if you are running the commands on your
own computer. We will run it on the cluster. This requires some
deviations from the instructions in the tutorial. Rather than launch the
scripts by opening RevBayes and running a source() command,
you will use the job*.sh slurm scripts to launch revbayes
and run the .Rev scripts. To run the JC analysis, for
example, lanunch the job on the cluster with:
sbatch job_jc.sh
In the interest of time, we will skip the HKY model. Feel free to go through it if you like.
To understand the MCMC sampling, plot some key variables recorded in the log file through generations. Download the output file from the cluster and place it on your computer in the same folder as this file.
You can view the log files with Tracer, or with the code below.
We will plot the summary MCMC stats right here in R. Each replicate is shown as a different color.
trace_jc = read_tsv("revbayes_output/output_jc/primates_cytb_JC.log")
## Rows: 40004 Columns: 49
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## dbl (49): Iteration, Replicate_ID, Posterior, Likelihood, Prior, bl[1], bl[2...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
trace_jc %<>% # Read the data
mutate(Replicate_ID = as.factor(Replicate_ID)) # Convert the replicate id to a factor
trace_jc %<>% mutate(Iteration = Iteration %/% length(levels(trace_jc$Replicate_ID))) # The iterations are sequenctial across all replicates. Rescale them so that they are in number of generations per replicate, rather than total number of samples
Now let’s plot Tree length, TL, as in the tutorial:
trace_jc %>%
ggplot(aes(x=Iteration, y=TL, col=Replicate_ID), alpha=0.2) + geom_line()
trace_jc %>%
ggplot(aes(x=Iteration, y=Likelihood, col=Replicate_ID), alpha=0.2) + geom_line()
These plots show all iteration, ie generations. These include the
first MCMC iterations before the run has burned in. By default, the readTreeTrace
that is used in our scripts to import trees to summarize them discards
the phylogenies from the first 25% of iterations as a burn in. Here we
will do the same, and replot:
trace_jc %>%
filter( Iteration > 2500 ) %>%
ggplot(aes(x=Iteration, y=TL, col=Replicate_ID), alpha=0.2) + geom_line()
trace_jc %>%
filter( Iteration > 2500 ) %>%
ggplot(aes(x=Iteration, y=Likelihood, col=Replicate_ID), alpha=0.2) + geom_line()
Now we will also look at the post-burning distribution of TL and Likelihood
trace_jc %>%
filter( Iteration > 2500 ) %>%
ggplot(aes(TL, fill = Replicate_ID, colour = Replicate_ID)) +
geom_density(alpha = 0.1)
trace_jc %>%
filter( Iteration > 2500 ) %>%
ggplot(aes(Likelihood, fill = Replicate_ID, colour = Replicate_ID)) +
geom_density(alpha = 0.1)
Problem 1: (2 points) Answer the following questions about the MCMC summaries for the JC model above.
Open the tree output_jc/primates_cytb_JC_MAP.tree on
your local computer with FigTree to inspect the topology, edge length,
and posterior probabilities. You will need to check the
Node Labels box and then in Display: select
posterior.
Problem 2: (1.5 points) Answer the following questions for the GTR analyses. Copy and paste the codeblocks from above and modify them to import these results and plot them, or use Tracer to view the results. Pick at least one other model paraemter to examine in addition to TL and Likelihood. Note that this analysis has just a single replicate.
trace_gtr = read_tsv("revbayes_output/output_gtr/primates_cytb_GTR.log")
## Rows: 1001 Columns: 58
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## dbl (58): Iteration, Posterior, Likelihood, Prior, bl[1], bl[2], bl[3], bl[4...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
trace_gtr %>%
ggplot(aes(x=Iteration, y=TL), alpha=0.2) + geom_line()
trace_gtr %>%
ggplot(aes(x=Iteration, y=Likelihood), alpha=0.2) + geom_line()
trace_gtr %>%
ggplot(aes(x=Iteration, y=Prior), alpha=0.2) + geom_line()
trace_gtr %>%
filter( Iteration > 2500 ) %>%
ggplot(aes(x=Iteration, y=TL), alpha=0.2) + geom_line()
trace_gtr %>%
filter( Iteration > 2500 ) %>%
ggplot(aes(x=Iteration, y=Likelihood), alpha=0.2) + geom_line()
trace_gtr %>%
filter( Iteration > 2500 ) %>%
ggplot(aes(x=Iteration, y=Prior), alpha=0.2) + geom_line()
trace_gtr %>%
filter( Iteration > 2500 ) %>%
ggplot(aes(TL)) +
geom_density(alpha = 0.1)
trace_gtr %>%
filter( Iteration > 2500 ) %>%
ggplot(aes(Likelihood)) +
geom_density(alpha = 0.1)
trace_gtr %>%
filter( Iteration > 2500 ) %>%
ggplot(aes(Prior)) +
geom_density(alpha = 0.1)
Problem 3: (1.5 points) Answer the following questions for the GTR+IG analyses. Copy and paste the codeblocks from above and modify them to import these results and plot them, or use Tracer to view the results. Examine alpha in addition to TL and likelihood. Note that this analysis has just a single replicate.
trace_gtrig = read_tsv("revbayes_output/output_gtrig/primates_cytb_GTRGI.log")
## Rows: 1001 Columns: 64
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## dbl (64): Iteration, Posterior, Likelihood, Prior, alpha, bl[1], bl[2], bl[3...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
trace_gtrig %>%
ggplot(aes(x=Iteration, y=TL), alpha=0.2) + geom_line()
trace_gtrig %>%
ggplot(aes(x=Iteration, y=Likelihood), alpha=0.2) + geom_line()
trace_gtrig %>%
ggplot(aes(x=Iteration, y=alpha), alpha=0.2) + geom_line()
trace_gtrig %>%
filter( Iteration > 2500 ) %>%
ggplot(aes(x=Iteration, y=TL), alpha=0.2) + geom_line()
trace_gtrig %>%
filter( Iteration > 2500 ) %>%
ggplot(aes(x=Iteration, y=Likelihood), alpha=0.2) + geom_line()
trace_gtrig %>%
filter( Iteration > 2500 ) %>%
ggplot(aes(x=Iteration, y=alpha), alpha=0.2) + geom_line()
trace_gtrig %>%
filter( Iteration > 2500 ) %>%
ggplot(aes(TL)) +
geom_density(alpha = 0.1)
trace_gtrig %>%
filter( Iteration > 2500 ) %>%
ggplot(aes(Likelihood)) +
geom_density(alpha = 0.1)
trace_gtrig %>%
filter( Iteration > 2500 ) %>%
ggplot(aes(alpha)) +
geom_density(alpha = 0.1)
Make sure you have all files in this folder and have knit the document. Compress the folder with zip, and submit it on canvas.